Exploring Financial Time Series Data

In this notebook, we explore the application of Independent Component Analysis (ICA) to the returns of the top 28 S&P 500 stocks from 11/06 to 11/16. The data is available at https://github.com/Freshdachs/FinData .

In [247]:
import numpy as np
import pandas as pd
import datetime
import plotly.offline as py
import plotly.graph_objs as go
from sklearn.decomposition import FastICA, PCA
from itertools import islice
import functools

py.init_notebook_mode(connected=True)

plot = lambda d,lbls:py.iplot([go.Scatter(x=d.index,y=d[i],name=i) for i in lbls])

#tuple monad
unit_t = lambda x: x if isinstance(x,(list,tuple)) else (x,)  
bind_t = lambda f,m: f(*m)


def iterate(f,*x,unit=unit_t,bind=bind_t):
    while 1:
        x = bind(f,x)
        yield x
        x = unit(x)
        
def nth(n,it):
    # islice objects are not subscriptable, so take the n-th value via next()
    return next(islice(it, n - 1, n))
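A quick self-contained check of these helpers (with `nth` written via `next`/`islice`, since `islice` objects are not subscriptable):

```python
from itertools import islice

# Self-contained sketch of the helpers above: iterate() yields
# f(x), f(f(x)), ...; nth() returns the n-th yielded value (1-indexed).
def iterate(f, *x):
    unit = lambda v: v if isinstance(v, (list, tuple)) else (v,)
    while True:
        x = f(*x)
        yield x
        x = unit(x)

def nth(n, it):
    return next(islice(it, n - 1, n))

# Example: repeated doubling starting from 1
print(nth(5, iterate(lambda x: 2 * x, 1)))  # 32
```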
In [283]:
#load timeseries
df=pd.read_csv('tss3.csv', sep=';',index_col=0,parse_dates=True,decimal=',')
#sort columns by sum
df = df[df.columns[np.argsort(df.apply(sum).values)][::-1]]

Data

We use the daily adjusted closing prices of 441 S&P 500 stocks over 2458 trading days from 2006-11-01 to 2016-11-01. We retrieved the data from Quandl and cleaned it so that only those stocks and days remain for which complete data is available.

In [4]:
def pre_diff(df):
    d=df.copy()
    d.values[1:] =df.values[1:]-df.values[:-1]
    return (d.drop([d.index[0]]),df[0:1])

def pre_mean(df):
    d=df.copy()
    return (d.apply(lambda i: i-np.mean(i)),d.apply(np.mean))

def pre_normalize(df):
    d=df.copy()
    fac=np.max(np.max(np.abs(d)))
    return (d.apply(lambda i: i/fac), fac)

def pre_process(df):
    d, base=pre_diff(df)
    d, means = pre_mean(d)
    d, fac=pre_normalize(d)
    return (d,base,means,fac)
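The chain above can be sketched on a toy frame (hypothetical tickers, with pandas built-ins standing in for the helper functions):

```python
import numpy as np
import pandas as pd

# Toy sketch of the preprocessing chain on hypothetical tickers:
# first-difference, de-mean per column, then scale by the global
# maximum absolute value.
df = pd.DataFrame({'AAA': [10., 11., 13.], 'BBB': [20., 19., 22.]})

diffed = df.diff().dropna()          # pre_diff: daily changes
demeaned = diffed - diffed.mean()    # pre_mean: column-wise de-meaning
fac = demeaned.abs().values.max()    # pre_normalize: one global factor
normalized = demeaned / fac

print(normalized.abs().values.max())  # 1.0
```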
In [5]:
pre_df, base,means,fac = pre_process(df)
In [114]:
idcs = np.argsort(df.apply(max).values)
#lbls = [pre_df.columns[i] for i in idcs[-8:]]
#lbls+['MMM']
lbls = df.columns[:8].values

Differential, Normalized Returns

Below we compare the differential, normalized returns with the regular returns.

In [116]:
plot(pre_df,lbls)
plot(df,lbls)

ICA

In this section, we apply the ICA approach to our data.

In [246]:
# set up the ICA estimator (fitting happens below)
ica = FastICA(whiten=False,max_iter=1000)

#reduce datasize
d, base,means,fac = pre_process(df[df.columns[:28]])
In [265]:
# compute sources and invert-transformed signal S from dataframe d
def ica_d(d):
    ica_d = ica.fit(d)
    S = ica_d.transform(d)
    return (S, ica_d.mixing_, ica_d.inverse_transform(S))
S,A,x = ica_d(d)
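As a sanity check of this round trip, a small synthetic example (illustrative parameters, not the stock returns) shows that with as many components as features, `transform` followed by `inverse_transform` recovers the input:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Synthetic sanity check: with as many components as features, FastICA's
# transform followed by inverse_transform reconstructs the input.
rng = np.random.RandomState(0)
S_true = rng.laplace(size=(500, 3))      # non-Gaussian sources
A_true = rng.normal(size=(3, 3))         # mixing matrix
X = S_true @ A_true.T                    # observed mixtures

ica = FastICA(n_components=3, random_state=0, max_iter=1000)
S_est = ica.fit_transform(X)
X_rec = ica.inverse_transform(S_est)
print(np.allclose(X, X_rec))  # True
```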
In [241]:
def reconstruct(df,x,base,means,fac):
    d=df.copy()
    x = x*fac+means.values
    d.values[:] = (np.cumsum(x,0)+base.values)[:]
    return base.append(d)
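The inversion relies on a cumulative sum undoing the first difference exactly, which is why `reconstruct` prepends the base row; a minimal sketch:

```python
import numpy as np
import pandas as pd

# Minimal sketch: a first difference is undone exactly by a cumulative
# sum plus the first observation.
prices = pd.Series([10., 11., 13., 12.])
diffs = prices.diff().dropna()
recovered = np.concatenate(([prices.iloc[0]],
                            prices.iloc[0] + diffs.cumsum().values))
print(recovered)  # [10. 11. 13. 12.]
```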
In [282]:
def ranking(S,A):
    return np.apply_along_axis(lambda row: np.argsort(np.apply_along_axis(max,0,np.abs(S*row))),1,A)
In [278]:
def indices(R,i=0,n=8):
    return (R[i][-n:],R[i][:n])

def sep_ics(S,A,R,i=0, n=8):
    return list(np.dot(S[:,f],A[:,f].T)  for f in indices(R,i,n))
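On synthetic data (hypothetical component strengths) the split behaves as intended: the strongest components carry most of the reconstruction.

```python
import numpy as np

# Toy sketch of the major/minor split: reconstruct a mixture from the
# strongest and the weakest components separately and compare how much
# of the full signal each group recovers.
rng = np.random.RandomState(1)
S = rng.normal(size=(100, 4)) * np.array([5., 0.1, 3., 0.05])
A = rng.normal(size=(4, 4))
X = S @ A.T

order = np.argsort(np.max(np.abs(S), axis=0))   # weakest ... strongest
major, minor = order[-2:], order[:2]
X_maj = S[:, major] @ A[:, major].T
X_min = S[:, minor] @ A[:, minor].T

# the major reconstruction leaves a much smaller residual
print(np.linalg.norm(X - X_maj) < np.linalg.norm(X - X_min))  # True
```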



def to_df(d,v):
    e=d.copy()
    e.values[:]=v[:]
    return e
In [279]:
R=ranking(S,A)
X_maj, X_min =sep_ics(S,A,R)
rec_df = reconstruct(d,X_maj,base,means,fac)
min_df = reconstruct(d,X_min,base,means,fac)
plot(rec_df,lbls)
plot(min_df,lbls)
plot(df,lbls)

Here we see that both the minor and the major components can be reconstructed to similar approximations. We see upward trends in both reconstructions; how can that be? We subtract the mean from our data during preprocessing, so even a zero signal still has the mean added back during reconstruction. Since we reconstruct the first difference, this addition translates to a constant slope.
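Numerically, the effect looks like this (hypothetical mean daily change):

```python
import numpy as np

# Sketch of the constant-slope effect: even a zero signal gets the
# removed per-step mean added back, and its cumulative sum grows linearly.
mean = 0.3                       # hypothetical mean daily change
zero_signal = np.zeros(5)
reconstructed = np.cumsum(zero_signal + mean)
print(reconstructed)  # [0.3 0.6 0.9 1.2 1.5]
```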

This shows that we can successfully decompose the data with ICA and reconstruct it again.

PCA

In this section, we analyze what results a PCA approach yields.

In [276]:
pca = PCA()

# compute sources and invert-transformed signal S from dataframe d
def pca_d(d):
    pca_d = pca.fit(d)
    S=pca_d.transform(d)
    return (S, pca_d.components_.T, pca_d.inverse_transform(S))
S,A,x = pca_d(d)
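As with ICA, keeping all components makes the transform invertible; a small synthetic check:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic sanity check: with all components retained, PCA's
# transform/inverse_transform is an exact round trip, so the same
# reconstruct() machinery applies to the PCA sources.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
pca = PCA()
S = pca.fit_transform(X)
X_rec = pca.inverse_transform(S)
print(np.allclose(X, X_rec))  # True
```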
In [281]:
R = ranking(S,A)
X_maj, X_min =sep_ics(S,A,R)
rec_df = reconstruct(d,X_maj,base,means,fac)
min_df = reconstruct(d,X_min,base,means,fac)
plot(rec_df,lbls)
plot(min_df,lbls)
plot(df,lbls)

Misc.

Random collection of code snippets + more.

In [490]:
topsources = lambda S,n: np.argsort( np.apply_along_axis(lambda x:np.max(np.abs(x)),0,S))[-n:]
topsources(S,8)
Out[490]:
array([410, 152, 386, 158, 183, 354,  14, 202])
In [491]:
#show top sources; topsources returns column indices, so slice columns of S
py.iplot([go.Scatter(y=S[:,i]) for i in topsources(S,8)])
In [484]:
np.argsort( np.apply_along_axis(np.max,0,S))
Out[484]:
array([354,  14, 314, 257, 227, 122, 316,  64, 234, 298, 167, 261, 307,
       293, 399, 151, 336, 436, 353, 131, 364,   2, 211, 109, 343, 144,
        34,  85,  81, 290, 240, 352, 394, 169, 415, 347, 411, 440, 337,
       258, 135,   1, 393, 365, 186, 378, 114,  98, 113, 263, 196, 308,
       198, 332,  28, 388, 315, 162,  93,  17, 136,  47,  78, 344, 181,
       310,  35, 164, 271, 387,  49, 161, 236, 346, 155, 260, 304, 292,
       158,  46,  18, 133, 323, 341, 157, 107, 241, 141, 395, 432, 324,
       383,  45, 418, 312,  76,  55, 207, 342,  27, 239, 423,   5, 372,
       246, 160, 278, 249, 380, 357, 385, 194, 129, 424, 402, 428, 140,
       193, 437, 280, 218,  74, 369, 214,  83, 259, 412, 281, 105, 371,
        92,  24, 178, 374, 287, 153, 201, 191, 254, 209,  44, 104, 250,
       272, 244, 302, 205, 226, 138,  30, 128,   9, 318, 300, 309, 130,
        86, 384,  41,  95, 230,  22, 126, 338, 275, 143,  59, 243, 274,
       363, 123, 232, 148, 417,  48, 409, 184, 118, 377, 361,  16,  11,
        50, 320, 190,  19, 416, 267, 283,  60, 245, 367, 220,   6, 266,
       223, 233, 121,  67, 154, 165,  33, 403, 362, 340, 185, 396, 111,
       295, 116, 156, 146, 405, 359,  75, 334, 286, 204, 390, 322, 195,
       262, 317,  82, 427, 398, 313, 197,  15, 291, 335, 431,  69, 333,
       366,  61,  13, 251, 210, 124, 331, 264, 208, 279, 179, 339, 199,
       187, 173, 110, 419, 400,  26,  79,  51,  91, 174,   3, 350,  57,
        12,  43,   8, 430, 434, 277, 276, 225, 426, 319, 255, 303,  84,
       108, 401,  29,  70, 327,  65, 177, 119,  36, 404, 435, 102, 139,
       397, 112, 355, 429,  54, 311, 106, 200,  31,   7, 206, 326,  23,
       420, 373, 192, 147, 382, 252, 132, 219, 242, 376, 370,  58, 356,
       391, 358,  56, 172, 349, 168, 299, 120,  32,  96, 248,  62, 231,
       433,   4, 289, 229,  20, 189, 288, 100, 297, 127,  38, 268, 247,
       351, 217,  37,  68,  21, 150,  97, 125,  10, 235, 149,  52,  77,
       256,  87, 282,  88, 379, 137, 439, 296, 213, 176, 360, 306, 166,
        89, 171, 103, 237,  94, 238, 134,  72, 222, 182, 273, 188, 389,
        40,  25, 228, 175, 421, 221, 329, 345, 413, 215, 159,  53,  71,
        63,  66, 142, 115, 422, 325,  90, 408, 216, 101, 381,  80, 163,
       368, 414, 212,  73, 265, 253, 224,   0, 180, 170, 321, 425, 438,
       305, 375, 269, 270, 117, 145, 348, 406, 330, 284, 301, 392,  42,
       407, 285, 294, 328,  39, 203,  99, 410, 152, 386, 183, 202])